1 Introduction

Whence the Born rule? It is fundamental to quantum mechanics; it is the essen- 
tial link between probability and a formalism which is otherwise deterministic; it 
encapsulates the measurement postulates. Gleason's theorem [4] is mathemati- 
cally informative, but its premises are too strong to have any direct operational 
meaning: here the Born rule is derived more simply, from purely operational 
assumptions. 

The argument we shall present is based on Deutsch's derivation of the Born 
rule from decision theory [2]. The latter was criticized by Barnum et al. [1],
but their objections hinged on ambiguities in Deutsch's notation that have re- 
cently been resolved by Wallace [12]; here we follow Wallace's formulation. The 
argument is not quite the same as Wallace's, however. Wallace draws heavily 
on the Everett interpretation, as well as on decision theory; like Deutsch, he is 
concerned with constraints on subjective probability, rather than any objective 
counterpart to it. In contrast, the derivation of the Born rule that we shall 
present is independent of decision theory, independent of the interpretation of 
probability, and independent of any assumptions about the measuring process. 
As such it applies to all the major foundational approaches to quantum me- 
chanics. 

We assume the conventional scheme for the description of experiments: an 
initial state, measured observable, and set of macroscopic outcomes. Given a 
description of this form, we assume there is a general algorithm for the
expectation value of the observable outcomes (the Born rule is such an
algorithm). The argument then takes the following form: for a particular class
of experiments there are definite rules for determining such descriptions, based
on simple operational rules, and theoretical assumptions that concern only the
state-preparation device, not the measurement device. These rules imply that in
general such experiments can be described in different ways. But the algorithm
we are looking
for concerns the expectation value of the observed outcomes, so applied to these 
different descriptions, it must yield the same expectation value. Constraints of 
this form are in fact sufficient to force the Born rule. If there is to be such an 
algorithm, then it is the Born rule. 
2 Multiple-Channel Experiments



The kinds of experiments we shall consider are limited in the following
respects: they are repeatable; there is a clear distinction between the state
preparation device and the detection and registration device; and - this is the
most important limitation - we assume that for a given state-preparation device,
preparing the system to be measured in a definite initial state, the state can
be resolved into channels, each of which can be independently blocked, in such a
way that when only one channel is open the outcome of the experiment is
deterministic - in the sense that if there is any outcome at all (on repetition
of the experiment) it is always the same outcome. We further suppose that for
every outcome there is at least one channel for which it is deterministic, and -
in order to associate a definite initial state with a particular region of the
apparatus - we suppose that all the channels are recombined prior to the
measurement process proper.

For an example of such an experiment that measures spin, consider a neutron 
interferometer, where orthogonal states of spin (with respect to a given axis) 
are produced by a beam-splitter, each propagating along different arms of 
the interferometer, before being recombined prior to the measurement of that 
component of spin. For an example that measures position, consider an optical 
two-slit experiment, adapted so that the lensing system after the slits first brings 
the light into coincidence, but then focuses it on detectors in such a way that 
each can receive light from only one of the slits. It is not too hard to specify 
an analogous procedure in the case of momentum;² any number of familiar
experiments can be converted into an experiment of this kind. 

We introduce the following notation. Let there be $d$ channels in all, with
$D \le d$ possible outcomes $u_j \in U$, $j = 1, \dots, D$. These outcomes are
macroscopic events (e.g. positions of pointers). Let $M$ denote the experiment
that is performed when all the channels are open, and $M_k$, $k = 1, \dots, d$,
the (deterministic) experiment that is performed when only the $k$th channel is
open. Let there be identifiable regions $r_1, r_2, \dots$ of the
state-preparation device through which the system to be measured must pass (if
it is to be subsequently detected at all - regardless of which channels are
open). Call an experiment satisfying these specifications a multiple-channel
experiment.

One could go further, and provide operational definitions of the initial states 
in each case, but we are looking for a probability algorithm that can be applied 
to states that are mathematically defined (so any operational definition of the 
initial state would eventually have to be converted into a mathematical one): 
we may as well work with the mathematical state from the beginning. 

3 Models of Experiments 

Turn now to the schematic, mathematical descriptions of experiments. Our
assumptions are conventional: we suppose that an experiment is designed to
measure some observable $X$ on a complex Hilbert space $H$, which for
convenience we take to be of finite dimensionality;³ we suppose that the
apparatus is prepared in some initial state $\psi$, normalized to one,⁴ and that
on measurement one of a finite number of microscopic outcomes
$\lambda_k \in \mathrm{Sp}(X)$ results, $k = 1, \dots, d$ (we allow for
repetitions, i.e. for some $j \neq k$ we may have $\lambda_j = \lambda_k$). We
suppose that these microscopic events are amplified up to the macroscopic level
by some physical process $\Omega : \mathrm{Sp}(X) \to U$, yielding one or other
of the $D$ possible displayed outcomes $u_j \in U$. We suppose the latter
macroscopic events occur with probabilities $p_j$, $j = 1, \dots, D$.⁵

2 The conventional method for preparing a beam of charged particles of definite
momentum (by selecting for deflection in a magnetic field) can be adapted quite
simply.

We take it that the details of the detection and amplification process are
what are disputed, not that there is such a process, nor that it results in
macroscopic outcomes $u_j$. The probabilities computed from records of repeated
trials concern in the first instance these registered, macroscopic outcomes, not
the unobservable microscopic events $\lambda_k$ (indeed, on some approaches to
foundations, there are no probabilistic microscopic events, prior to
amplification up to the macroscopic level). To keep this distinction firmly in
mind - and the distinction between the sets $U$ and $\mathbb{R}$ - we shall not
assume (as is usual) the numerical equality of $\Omega(\lambda_k)$ with
$\lambda_k$; we do, however, assume that the macroscopic outcomes $u_j \in U$
are physical numerals, so that addition and multiplication operations can be
defined on them. For convenience we assume that none of these numerals is the
zero.

Call the triple $\langle \psi, X, \Omega \rangle$ an experimental model, denoted
$g$. This scheme extends without any modification to experiments where there are
inefficiencies in the detection and registration devices, so long as they are
the same for every channel. (A more sophisticated scheme will be needed if the
efficiencies differ from one channel to the next, however; we neglect this
complication here.)

This scheme applies to a much wider variety of experiments than
multiple-channel experiments; the Born rule is conventionally stated in just
these terms. We shall be interested in algorithms that assign real numbers to
experimental models, interpreted as expectation values, i.e. weighted averages
of the quantities $u_j$, with weights given by the probabilities $p_j$ of each
$u_j$, $j = 1, \dots, D$. We are therefore looking for a map
$V : g \to \mathbb{R}$ of the form:

$$V[\psi, X, \Omega] = \sum_{j=1}^{D} p_j u_j, \qquad \sum_{j=1}^{D} p_j = 1. \tag{1}$$

3 It would be just as easy to work with Hilbert spaces of countably infinite
dimension, and restrict instead the observables to self-adjoint operators with
purely point spectra. (The difficulty with observables with continuous spectra
is purely technical, however.)

4 Later on we shall consider the consequences of relaxing the normalization
condition (correspondingly, we use the term "state" loosely, to mean any Hilbert
space vector defined up to phase).

5 In the case of the Everett interpretation, we say rather that all of the
macroscopic outcomes result, but that each of them is in a different branch
(with a given amplitude). (We will consider the interpretation of probability in
the Everett interpretation in due course.)

If $D = d$ we can write the $u_j$'s directly in terms of the
$\Omega(\lambda_k)$'s. Otherwise define
$\Omega^{-1}(u_j) = \{k : \Omega(\lambda_k) = u_j\}$, $j = 1, \dots, D$, and
choose any real numbers $w_k \in [0, 1]$, $k = 1, \dots, d$, such that
$\sum_{k \in \Omega^{-1}(u_j)} w_k = p_j$. From Eq.(1) we obtain:

$$V[\psi, X, \Omega] = \sum_{k=1}^{d} w_k \Omega(\lambda_k), \qquad \sum_{k=1}^{d} w_k = 1. \tag{2}$$

Conversely, given any $d$ real numbers $w_k \in [0, 1]$ satisfying Eq.(2),
define the $D$ numbers $p_j = \sum_{k \in \Omega^{-1}(u_j)} w_k$; from Eq.(2) we
obtain Eq.(1).

In what follows, we assume the existence of probabilities pj satisfying Eq.(l), 
and therefore that there are real numbers Wk satisfying Eq.(2). The latter will 
prove more convenient for calculations.
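The passage between Eq.(1) and Eq.(2) can be illustrated numerically. The following is only a minimal sketch: the function names and the example weights and outcomes are hypothetical, chosen for illustration, and exact rational arithmetic is used so that the two expectation values can be compared without rounding.

```python
from collections import defaultdict
from fractions import Fraction


def expectation_from_weights(weights, outcomes):
    """Eq.(2): sum_k w_k * Omega(lambda_k), where outcomes[k] = Omega(lambda_k)."""
    return sum(w * u for w, u in zip(weights, outcomes))


def outcome_probabilities(weights, outcomes):
    """Group channel weights by displayed outcome: p_j = sum of w_k with Omega(lambda_k) = u_j."""
    probs = defaultdict(Fraction)
    for w, u in zip(weights, outcomes):
        probs[u] += w
    return dict(probs)


def expectation_from_probabilities(probs):
    """Eq.(1): sum_j p_j * u_j."""
    return sum(p * u for u, p in probs.items())


# Hypothetical example: d = 3 channels, D = 2 distinct macroscopic outcomes.
weights = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
outcomes = [Fraction(5), Fraction(5), Fraction(-2)]
probs = outcome_probabilities(weights, outcomes)
```

Here the first two channels share the outcome 5, so their weights are pooled into a single probability; both forms then give the same expectation value.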

4 The Consistency Condition 

Our general strategy is as follows. In the special case of multiple-channel
experiments, there are clear criteria for when an experiment is to be assigned a
given model. There follows an important constraint on $V$: for if $M$ is
assigned two distinct models $g$, $g'$, and if there is to be any general
algorithm $V : g \to \mathbb{R}$, then the expectation values it assigns to
these two models had better agree, i.e. $V(g) = V(g')$. We view this as a
consistency condition on $V$. Failing this condition, expectation values of
models could have no unequivocal experimental meaning. The probabilistic outcome
events $u_k \in U$ that we are talking of are all observable; it is the mean
values of these that the quantities $V(g)$ concern; if one and the same mean
value is matched to two expectation values, $V(g) \neq V(g')$, then either the
experiment cannot be modelled by $g$ and $g'$, or there is no algorithm $V$ for
mapping models to expectation values.

That a condition of this kind played a tacit role in Deutsch's derivation
was recognized by Wallace; it was used explicitly in Wallace's deduction [12] of
the Born rule, although there it was cast in a slightly different form, and the
conditions for its use were stated in terms of the Everett theory of measurement
(including the theory of the detection and registration process). Here we make
do with operational criteria, and with assumptions about the behavior of the
state prior to any detection events; we suppose that this prior evolution of the
state is purely deterministic, and governed by the unitary formalism of quantum
mechanics.⁶

Consider a multiple-channel experiment $M$. By assumption, there are $d$
deterministic experiments $M_k$, $k = 1, \dots, d$, that can also be performed
with this apparatus, on blocking every channel save the $k$th, each yielding one
of the $D$ macroscopic outcomes $u_j \in U$. Given that the initial state in
region $r$ for $M_k$ is $\varphi_k$, it is clear enough, on operational grounds,
as to what can be counted as a model for this experiment: the experiment
measures any $X$ such that $X\varphi_k = \lambda_k \varphi_k$, for any real
$\lambda_k$, and any $\Omega$ such that $\Omega(\lambda_k) \in U$ is the outcome
of $M_k$.

6 Of course in its initial phases the process of state preparation will involve
probabilistic events, if only in collimating particles produced from the source,
or in blocking particular channels. But it does not matter what these
probabilities are; all that matters is that if a particle is located in a given
region of the apparatus, then it is in a definite state, and unitarily develops
in a definite way (prior to any detection or registration process).
Now consider the indeterministic experiment $M$ with every channel open.
We suppose that the state of $M$ at $r$ is
$\psi = \sum_{k=1}^{d} c_k \varphi_k$; then the observable measured is any $X$
such that $X\varphi_k = \lambda_k \varphi_k$ for $k = 1, \dots, d$, and any
$\Omega$ such that $\Omega(\lambda_k) \in U$ is the outcome of each $M_k$.

Let us state this as a definition: 

Definition 1 Let $M$ have $d$ channels and $D$ outcomes. Then $M$ realizes
$\langle \psi, X, \Omega \rangle$ if and only if

(i) for some region $r$ and orthogonal states $\{\varphi_k\}$, $\varphi_k$ is
the state of $M_k$ in $r$, $k = 1, \dots, d \ge D$, and
$\psi = \sum_{k=1}^{d} c_k \varphi_k$ is the state of $M$ in $r$,

(ii) $X\varphi_k = \lambda_k \varphi_k$, $k = 1, \dots, d$,

(iii) $\Omega(\lambda_k)$ is the outcome of $M_k$, $k = 1, \dots, d$.

The definition applies equally to a deterministic experiment (the limiting case
in which $d = D = 1$). Bearing in mind that from our definition of
multiple-channel experiments, for each $u_j \in U$, there is at least one $M_k$
for which $u_j$ is deterministic, it follows from (ii), (iii) that $X$ has at
least $D$ distinct eigenvalues.

Why is it right to model experiments in this way and not some other? The
deterministic case speaks for itself; in the indeterministic case, the short
answer is that it is underwritten by the linearity of the equations of motion.
An apparatus that deterministically measures each eigenvalue $\lambda_k$ of $X$,
when the state in a given region of the apparatus is $\varphi_k$, will
indeterministically measure the eigenvalues $\lambda_k$ of $X$, when the state
in that region is in a superposition of the $\varphi_k$'s. This principle is
implicit in standard laboratory procedures; this is how measuring devices are
standardly calibrated, and how their functioning is checked.

The consistency condition now reads:

Definition 2 $V$ is consistent if and only if $V(g) = V(g')$ whenever $g$ and
$g'$ can be realized by the same experiment.

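In the deterministic case evidently:

$$V[\varphi_k, \lambda_k P_{\varphi_k}, \Omega] = \Omega(\lambda_k). \tag{3}$$

We will show that if $|\psi| = 1$ and $V$ is consistent, with
$\langle \cdot, \cdot \rangle$ the inner product on $H$, then⁷

$$V[\psi, X, \Omega] = \langle \psi, \Omega(X)\psi \rangle. \tag{4}$$

Eq.(4) is the Born rule.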

We begin with some simple consequences of the consistency condition. The
Born rule is then derived in stages: first for equal norms in the simplest
possible case of a spin half system; then for the general case of equal norms;
and then for rational norms. The general case of irrational norms is handled by
a simple continuity condition. As promised, we shall also derive a probability
rule for initial states normalized to arbitrary finite numbers.

7 Whilst $\Omega(X)$ makes no sense as an operator (as the values of $\Omega$
are physical numerals like pointer-positions, not real numbers) we are assuming
that arithmetic operations can be defined for the $\Omega(\lambda_k)$'s; define
$\langle \psi, \Omega(\lambda_k) P_{\varphi_k} \psi \rangle = \Omega(\lambda_k) \langle \psi, P_{\varphi_k} \psi \rangle$
accordingly, and extend by linearity.



5 Consequences of the Consistency Condition 

We prove four general constraints on $V$ that follow from consistency.
(Eqs.(5)-(8) may be found in Wallace [12], derived on somewhat different
assumptions.)
In each case an equality is derived from the fact that a single experiment 
realizes two different models: by consistency, each must be assigned the same 
expectation value. 

We assume it is not in doubt that there do exist such experiments, in which 
the initial state (prior to any detection or amplification process) evolves unitarily 
in the manner stated. 

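Lemma 3 Let $V$ be consistent. It follows

(i) for invertible $f : \mathbb{R} \to \mathbb{R}$:

$$V[\psi, X, \Omega] = V[\psi, f(X), \Omega \circ f^{-1}]. \tag{5}$$

(ii) For orthogonal projectors $\{P_k\}$, $k = 1, \dots, d$, such that
$P_k \varphi_j = \delta_{kj} \varphi_j$:

$$V\Big[\sum_{k=1}^{d} c_k \varphi_k, \sum_{k=1}^{d} \lambda_k P_{\varphi_k}, \Omega\Big] = V\Big[\sum_{k=1}^{d} c_k \varphi_k, \sum_{k=1}^{d} \lambda_k P_k, \Omega\Big]. \tag{6}$$

(iii) For $U_\theta : \varphi_k \to e^{i\theta_k} \varphi_k$,
$k = 1, \dots, d$, for arbitrary $\theta_k \in [0, 2\pi] \subset \mathbb{R}$:

$$V\Big[\psi, \sum_{k=1}^{d} \lambda_k P_{\varphi_k}, \Omega\Big] = V\Big[U_\theta \psi, \sum_{k=1}^{d} \lambda_k P_{\varphi_k}, \Omega\Big]. \tag{7}$$

(iv) For $U_\pi : \varphi_k \to \varphi_{\pi(k)}$, where $\pi$ is any
permutation of $\langle 1, \dots, d \rangle$:

$$V[\psi, X, \Omega] = V[U_\pi \psi, \pi^{-1}(X), \Omega]. \tag{8}$$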

Proof. Let $g = \langle \psi, X, \Omega \rangle$ be realized by $M$ with $d$
channels. Then for some region $r_1$ the state of $M_k$ is $\varphi_k$,
$k = 1, \dots, d$, that of $M$ is $\sum_{k=1}^{d} c_k \varphi_k$, and there
exist (not necessarily distinct) real numbers $\lambda_1, \dots, \lambda_d$ such
that $X\varphi_k = \lambda_k \varphi_k$, $\Omega(\{\lambda_k\}) = U$. Since for
invertible $f$, $\Omega[f^{-1}(f(\lambda_k))] = \Omega(\lambda_k)$ and
$f(X)\varphi_k = f(\lambda_k)\varphi_k$, $M$ realizes
$\langle \psi, f(X), \Omega \circ f^{-1} \rangle$ and (i) follows from
consistency. Further, $M$ realizes any other model
$\langle \psi, Y, \Omega \rangle$ such that $Y\varphi_k = \lambda_k \varphi_k$;
$\sum_{k=1}^{d} \lambda_k P_k$ is such a $Y$, so (ii) follows from consistency.
Suppose now that $\psi$ evolves unitarily to the state $U_\theta \psi$ in region
$r_2$. Then in $r_2$ the state of each $M_k$ is $e^{i\theta_k} \varphi_k$, and
since $P_{e^{i\theta_k}\varphi_k} = P_{\varphi_k}$, $M$ realizes
$\langle U_\theta \psi, \sum_{k=1}^{d} \lambda_k P_{\varphi_k}, \Omega \rangle$,
and (iii) follows from consistency. Finally, let $\psi$ subsequently evolve to
the state $U_\pi \psi$ in region $r_3$. Then in $r_3$ the state of each $M_k$ is
$\varphi_{\pi(k)}$, and the state of $M$ is
$\sum_{k=1}^{d} c_k \varphi_{\pi(k)}$. Without loss of generality, we may write
$X$ as $\sum_{k=1}^{d} \lambda_k P_{\varphi_k}$; then
$\pi^{-1}(X) = \sum_{k=1}^{d} \lambda_k P_{\varphi_{\pi(k)}}$ satisfies
$\pi^{-1}(X)\varphi_{\pi(k)} = \lambda_k \varphi_{\pi(k)}$, so $M$ realizes
$\langle U_\pi \psi, \pi^{-1}(X), \Omega \rangle$, and (iv) follows from
consistency. ■

Eqs.(5)-(8) are of course trivial consequences of the Born rule, Eq.(4). Note
further that in each case the observables whose expectation values are
identified commute - these are constraints among probability assignments to
projectors belonging to a single resolution of the identity. Finally, note that
the normalization of the initial state $\psi$ played no role in the proofs.
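That Eqs.(5), (7) and (8) are satisfied by the Born rule can be checked numerically. The sketch below runs only in that easy direction (from the Born rule to the constraints, not the derivation itself), and rests on assumptions made purely for illustration: the state is represented by its amplitudes $c_k$ in the basis $\{\varphi_k\}$, the observable by its eigenvalues in that basis, and $\Omega$ by a hypothetical lookup table. Eq.(6) is automatic in this representation and is not checked.

```python
import cmath
from fractions import Fraction


def born(amplitudes, eigenvalues, omega):
    """Normalized Born expectation: sum_k |c_k|^2 Omega(lambda_k) / sum_j |c_j|^2."""
    norm_sq = sum(abs(c) ** 2 for c in amplitudes)
    total = sum(abs(c) ** 2 * omega(lam) for c, lam in zip(amplitudes, eigenvalues))
    return total / norm_sq


# Hypothetical, unnormalized example with d = 3.
c = [0.6 + 0.2j, -0.3j, 1.1]
lam = [Fraction(1, 2), Fraction(-1, 2), Fraction(2)]
table = {Fraction(1, 2): 1.0, Fraction(-1, 2): -3.0, Fraction(2): 4.5}
omega = table.__getitem__
base = born(c, lam, omega)

# Eq.(5): replace X by f(X) and Omega by Omega o f^{-1}, for an invertible f.
f = lambda x: 2 * x + 1
f_inv = lambda y: (y - 1) / 2
value_5 = born(c, [f(l) for l in lam], lambda y: omega(f_inv(y)))

# Eq.(7): multiply each amplitude by an arbitrary phase.
thetas = [0.3, 1.7, 4.0]
value_7 = born([ck * cmath.exp(1j * t) for ck, t in zip(c, thetas)], lam, omega)

# Eq.(8): U_pi moves the amplitude c_k onto phi_{pi(k)}, and pi^{-1}(X)
# assigns the eigenvalue lambda_k to phi_{pi(k)}.
pi = [2, 0, 1]
c_perm, lam_perm = [None] * 3, [None] * 3
for k in range(3):
    c_perm[pi[k]] = c[k]
    lam_perm[pi[k]] = lam[k]
value_8 = born(c_perm, lam_perm, omega)
```

All three transformed models return the same value as `base` up to floating-point rounding, and the state is deliberately left unnormalized, in line with the remark that normalization plays no role.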

6 Case 1: The Stern-Gerlach Experiment for 
Equal Norms 

Consider the Stern-Gerlach experiment with $d = D = 2$. Let
$X = \tfrac{1}{2}P_+ - \tfrac{1}{2}P_- = \sigma_z$ (in conventional notation),
the observable for the $z$-component of spin with eigenstates $\varphi_\pm$, and
let $\psi = c_+\varphi_+ + c_-\varphi_-$. Let $U_\pi$ interchange $\varphi_+$
and $\varphi_-$, so $U_\pi \sigma_z U_\pi^{-1} = -\sigma_z$. From Lemma 3(iv) it
follows that:

$$V[c_+\varphi_+ + c_-\varphi_-, \sigma_z, \Omega] = V[c_+\varphi_- + c_-\varphi_+, -\sigma_z, \Omega]. \tag{9}$$

From Eq.(9) and Lemma 3(i):

$$V[c_+\varphi_+ + c_-\varphi_-, \sigma_z, \Omega] = V[c_+\varphi_- + c_-\varphi_+, \sigma_z, \Omega \circ -I] \tag{10}$$

(where $(\Omega \circ -I)(x) = \Omega(-x)$). From Eq.(10), in the special case
that $|c_+|^2 = |c_-|^2$, and using Lemma 3(iii) to compensate for any
differences in phase:

$$V[c_+\varphi_+ + c_-\varphi_-, \sigma_z, \Omega] = V[c_+\varphi_+ + c_-\varphi_-, \sigma_z, \Omega \circ -I]. \tag{11}$$

Consider the LHS of this equality. From Eq.(2), writing $w_1 = w$,
$w_2 = 1 - w$, $\Omega(\pm\tfrac{1}{2}) = \Omega(\pm)$ (so that $\Omega(+)$
results with probability $w$, and $\Omega(-)$ results with probability $1 - w$),
we obtain the expectation value $x = w\Omega(+) + (1 - w)\Omega(-)$. But by
similar reasoning, the RHS yields
$w\Omega(-) + (1 - w)\Omega(+) = -x + \Omega(+) + \Omega(-)$. Equating the two,
$x = \tfrac{1}{2}[\Omega(+) + \Omega(-)]$.

We have shown, for $|c_+|^2 = |c_-|^2$:

$$V[c_+\varphi_+ + c_-\varphi_-, \sigma_z, \Omega] = \tfrac{1}{2}\Omega(+) + \tfrac{1}{2}\Omega(-) \tag{12}$$

in accordance with the Born rule. Note that here we have derived an expectation
value in a situation (dimension 2) where Gleason's theorem does not apply.
(Note that the normalization of the initial state $\psi$ is again irrelevant to
the result.)
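The last step of this argument is a one-line linear equation, which can be worked through in exact arithmetic. The following is a minimal sketch; the function name and the sample outcome values are hypothetical. It solves $w\Omega(+) + (1 - w)\Omega(-) = w\Omega(-) + (1 - w)\Omega(+)$ for $w$, assuming $\Omega(+) \neq \Omega(-)$.

```python
from fractions import Fraction


def equal_norm_weight(omega_plus, omega_minus):
    """Solve w*a + (1 - w)*b == w*b + (1 - w)*a for w, assuming a != b."""
    a, b = Fraction(omega_plus), Fraction(omega_minus)
    if a == b:
        # Both sides then equal a for every w, so the equation fixes nothing.
        raise ValueError("outcomes coincide; the equation does not constrain w")
    # Rearranged form of the equation: 2*w*(a - b) == (a - b).
    return (a - b) / (2 * (a - b))


# Hypothetical outcome numerals for spin up and spin down.
omega_plus, omega_minus = 3, -7
w = equal_norm_weight(omega_plus, omega_minus)
x = w * omega_plus + (1 - w) * omega_minus
```

When the two outcomes coincide the expectation value is simply that common outcome whatever $w$ is, so Eq.(12) holds in that case too.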
7 Case 2: General Superpositions of Equal Norms



Consider an arbitrary observable $X$ on any $d$-dimensional subspace $H_d$ of
Hilbert space. By the spectral theorem, we may write
$X = \sum_{k=1}^{d} \lambda_k P_{\varphi_k}$, for some set of orthogonal vectors
$\{\varphi_k\}$, $k = 1, \dots, d$, spanning $H_d$, where there may be
repetitions among the $\lambda_k$'s. Let $\psi$ be a (not necessarily
normalized) vector in $H_d$; then for some $d$-tuple of complex numbers
$\langle c_1, \dots, c_d \rangle$, $\psi = \sum_{k=1}^{d} c_k \varphi_k$. For
any permutation $\pi$, we have from Lemma 3(iv), (i):

$$V\Big[\sum_{k=1}^{d} c_k \varphi_k, X, \Omega\Big] = V\Big[\sum_{k=1}^{d} c_k \varphi_{\pi(k)}, \pi^{-1}(X), \Omega\Big] = V\Big[\sum_{k=1}^{d} c_k \varphi_{\pi(k)}, X, \Omega \circ \pi\Big]. \tag{13}$$

If $|c_k|^2 = |c_j|^2$, $j, k = 1, \dots, d$, using Lemma 3(iii) as before to
adjust for any phase differences

$$V[\psi, X, \Omega] = V[\psi, X, \Omega \circ \pi]. \tag{14}$$

Let $\langle w_1, \dots, w_d \rangle$ be a $d$-tuple of non-negative real
numbers satisfying Eq.(2). From Eq.(14):

$$\sum_{k=1}^{d} w_k \Omega(\lambda_k) = \sum_{k=1}^{d} w_k \Omega(\lambda_{\pi(k)}). \tag{15}$$

Eq.(15) holds for any permutation; let $\pi$ interchange $j$ and $k$, and
otherwise act as the identity. There follows

$$w_j \Omega(\lambda_j) + w_k \Omega(\lambda_k) = w_k \Omega(\lambda_j) + w_j \Omega(\lambda_k). \tag{16}$$

Conclude that if $\Omega(\lambda_j) \neq \Omega(\lambda_k)$ then $w_k = w_j$
(recall that by convention $0 \notin U$, so $\Omega(\lambda_k)$ is never zero).

If $D = d$, evidently $w_k = w_j$ for all $j, k = 1, \dots, d$. Since
$\sum_{k} w_k = 1$, $w_k = 1/d$, $k = 1, \dots, d$. Therefore

$$V[\psi, X, \Omega] = \frac{1}{d} \sum_{k=1}^{d} \Omega(\lambda_k). \tag{17}$$

If not, suppose $\Omega(\lambda_j) = \Omega(\lambda_k)$ for
$j, k = 1, \dots, b \le d$. (If $b = d$ Eq.(17) follows trivially.) For any
$j, k$ such that $b < j \le d$, $k \le b$,
$\Omega(\lambda_k) \neq \Omega(\lambda_j)$, from which we conclude as before
that $w_k = w_j$. Note further that under the stated conditions,
$1/d = |c_k|^2 \big(\sum_{j=1}^{d} |c_j|^2\big)^{-1}$. We have proved

Theorem 4 Let $\psi = \sum_{k=1}^{d} c_k \varphi_k$, where
$|c_k|^2 = |c_j|^2$ for all $j, k = 1, \dots, d$. Then if $V$ is consistent

$$V\Big[\sum_{k=1}^{d} c_k \varphi_k, \sum_{k=1}^{d} \lambda_k P_{\varphi_k}, \Omega\Big] = \sum_{k=1}^{d} \frac{|c_k|^2}{\sum_{j=1}^{d} |c_j|^2}\, \Omega(\lambda_k). \tag{18}$$

Like Lemma 3, Theorem 4 is independent of the normalization of $\psi$.
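The role of Eq.(15) can be made concrete by brute force over permutations. The sketch below is illustrative only (the helper name and the sample weights and outcomes are hypothetical): it tests whether a given tuple of weights leaves the weighted sum unchanged under every relabelling of the outcomes, as Eq.(15) demands in the equal-norm case.

```python
from fractions import Fraction
from itertools import permutations


def satisfies_eq15(weights, outcomes):
    """True if sum_k w_k Omega(lambda_k) equals sum_k w_k Omega(lambda_pi(k)) for every pi."""
    d = len(weights)
    lhs = sum(w * u for w, u in zip(weights, outcomes))
    return all(
        sum(weights[k] * outcomes[pi[k]] for k in range(d)) == lhs
        for pi in permutations(range(d))
    )


# Hypothetical example with d = D = 3 distinct outcomes.
outcomes = [Fraction(1), Fraction(4), Fraction(9)]
uniform = [Fraction(1, 3)] * 3
skewed = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]

uniform_ok = satisfies_eq15(uniform, outcomes)
skewed_ok = satisfies_eq15(skewed, outcomes)
```

Uniform weights pass for every permutation, whereas the skewed weights already fail for the transposition of the first two outcomes, which is the content of Eq.(16).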
8 Case 3: d=2 Normalized Superpositions with
Rational Norms 



The idea for extending these methods to treat the case of unequal but rational
norms is as follows: consider an experiment in which the initial state $\psi$
evolves deterministically so that each component $\varphi_k$ entering into the
initial superposition with amplitude $c_k$ evolves into a superposition of $z_k$
orthogonal states of equal norm $1/\sqrt{z_k}$, such that $|c_k|/\sqrt{z_k}$ is
constant for all $k$. One can then show that the experiment has a model in which
the initial state is a superposition of states of equal norms, so Theorem 4 can
be applied. (Evidently for this to work each $|c_k|^2$ will have to be a
rational number.)

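For simplicity, consider first the case $d = 2$ for real amplitudes. Let
$\psi = \sqrt{\tfrac{m}{m+n}}\,\varphi_1 + \sqrt{\tfrac{n}{m+n}}\,\varphi_2$,
where $m$ and $n$ are integers. Let
$X = \lambda_1 P_{\varphi_1} + \lambda_2 P_{\varphi_2}$. We will show that if
$V$ is consistent,
$V[\psi, X, \Omega] = \tfrac{m}{m+n}\Omega(\lambda_1) + \tfrac{n}{m+n}\Omega(\lambda_2)$.
Let the deterministic experiments of $M$ be $M_1, M_2$, with registered outcomes
$\Omega(\lambda_1)$, $\Omega(\lambda_2)$ respectively. Let the initial states of
$M$, $M_1$, $M_2$ in region $r_1$ be $\psi$, $\varphi_1$, $\varphi_2$
respectively. Then $M$ realizes $g_1 = \langle \psi, X, \Omega \rangle$. Now let
$\psi$ evolve to $U\psi$ in region $r_2$, where
$U\varphi_1 = \tfrac{1}{\sqrt{m}} \sum_{k=1}^{m} \chi_k$,
$U\varphi_2 = \tfrac{1}{\sqrt{n}} \sum_{k=m+1}^{m+n} \chi_k$, for some
orthogonal set of vectors $\{\chi_k\}$, $k = 1, \dots, m+n$. Denote
$\lambda_1 P_{U\varphi_1} + \lambda_2 P_{U\varphi_2}$ by $Y$. Then the initial
state of $M_i$, $i = 1, 2$, is $U\varphi_i$ in $r_2$, whilst that of $M$ is
$c_1 U\varphi_1 + c_2 U\varphi_2$ in $r_2$; since
$YU\varphi_i = \lambda_i U\varphi_i$ it follows that $M$ realizes
$g_2 = \langle U\psi, Y, \Omega \rangle$. By consistency, $V(g_1) = V(g_2)$. Now
define $P_1 = \sum_{k=1}^{m} P_{\chi_k}$,
$P_2 = \sum_{k=m+1}^{m+n} P_{\chi_k}$; since
$P_k U\varphi_j = \delta_{kj} U\varphi_j$, $k, j = 1, 2$, by Lemma 3(ii) it
follows that
$V[U\psi, Y, \Omega] = V[U\psi, \lambda_1 P_1 + \lambda_2 P_2, \Omega]$. But
$U\psi = \tfrac{1}{\sqrt{m+n}} \sum_{k=1}^{m+n} \chi_k$; applying Theorem 4 for
$d = m + n$, and noting that the outcome for $\chi_k$ is $\Omega(\lambda_1)$ for
$k = 1, \dots, m$, and $\Omega(\lambda_2)$ otherwise, the result follows.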
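The counting behind this argument can be sketched in a few lines. The code below is an illustration under stated assumptions, not part of the proof: the function names and the sample values of $m$, $n$ and the outcomes are hypothetical. It compares the equal-weight average over $m + n$ components (Theorem 4 applied after the splitting) with the Born-rule weighting $m/(m+n)$, $n/(m+n)$, and checks that every component $\chi_k$ ends up with amplitude $1/\sqrt{m+n}$.

```python
import math
from fractions import Fraction


def split_expectation(m, n, omega1, omega2):
    """Equal-weight average over m + n components: m yield omega1, n yield omega2."""
    outcomes = [Fraction(omega1)] * m + [Fraction(omega2)] * n
    return sum(outcomes) / (m + n)


def born_expectation(m, n, omega1, omega2):
    """Born-rule weights for squared amplitudes m/(m+n) and n/(m+n)."""
    return Fraction(m, m + n) * omega1 + Fraction(n, m + n) * omega2


def component_amplitudes(m, n):
    """Amplitude of each chi_k after phi_1 and phi_2 are split into m and n pieces."""
    a1 = math.sqrt(m / (m + n)) / math.sqrt(m)
    a2 = math.sqrt(n / (m + n)) / math.sqrt(n)
    return a1, a2


# Hypothetical example: squared amplitudes 2/5 and 3/5.
m, n = 2, 3
via_split = split_expectation(m, n, 10, -5)
via_born = born_expectation(m, n, 10, -5)
a1, a2 = component_amplitudes(m, n)
```

Both routes give the same expectation value, and the two amplitudes agree with each other, which is what allows Theorem 4 to be invoked.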



9 Case 4: General Superpositions with Rational 
Norms 

The argument just given assumed $\psi$ was normalized to one. The standard
rationale for this is of course based on the probabilistic interpretation of the
state, and hence, at least tacitly, on the Born rule. It may be objected that we
are only able to derive the dependence of the expectation value on the squares
of the norms of the initial state, because this is put in by hand from the
beginning. But this suspicion is unfounded. Suppose, indeed, only that
$\frac{|c_1|^2}{|c_2|^2} = \frac{m}{n}$. As before, define
$U\varphi_1 = \tfrac{1}{\sqrt{m}} \sum_{k=1}^{m} \chi_k$,
$U\varphi_2 = \tfrac{1}{\sqrt{n}} \sum_{k=m+1}^{m+n} \chi_k$. The state $U\psi$
in region $r_2$ will have whatever normalization $\psi$ had in $r_1$; the states
$U\varphi_i$, $i = 1, 2$, will be eigenstates of $P_i$, as before; Definitions
1, 2 will apply as before. Conclude that if $V$ is consistent,
$V[\psi, X, \Omega] = V[U\psi, \lambda_1 P_1 + \lambda_2 P_2, \Omega]$, as
before. The difference is that now

$$U\psi = \frac{c_1}{\sqrt{m}} \sum_{k=1}^{m} \chi_k + \frac{c_2}{\sqrt{n}} \sum_{k=m+1}^{m+n} \chi_k = \frac{c_1}{\sqrt{m}} \sum_{k=1}^{m+n} \chi_k = \frac{c_2}{\sqrt{n}} \sum_{k=1}^{m+n} \chi_k$$

(adjusting the phases of $c_1$ and $c_2$, using Lemma 3(iii), as required).
Evidently we have an initial state which is a superposition of $n + m$
components of equal norm, $m$ of which yield outcome $\Omega(\lambda_1)$ and $n$
of which yield outcome $\Omega(\lambda_2)$. Since
$\frac{m}{n+m} = \frac{|c_1|^2}{|c_1|^2 + |c_2|^2}$ and
$\frac{n}{n+m} = \frac{|c_2|^2}{|c_1|^2 + |c_2|^2}$,

$$V[\psi, \lambda_1 P_{\varphi_1} + \lambda_2 P_{\varphi_2}, \Omega] = \frac{|c_1|^2}{|c_1|^2 + |c_2|^2}\,\Omega(\lambda_1) + \frac{|c_2|^2}{|c_1|^2 + |c_2|^2}\,\Omega(\lambda_2). \tag{19}$$

Evidently the normalization of $\psi$ is irrelevant.
This result is worth proving in full generality: 

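Theorem 5 For each $i, j = 1, \dots, d$ let $c_i \in \mathbb{C}$ satisfy
$|c_i| > 0$, $\frac{|c_i|^2}{|c_j|^2} \in \mathbb{Q}$. Then

$$V\Big[\sum_{k=1}^{d} c_k \varphi_k, \sum_{k=1}^{d} \lambda_k P_{\varphi_k}, \Omega\Big] = \sum_{k=1}^{d} \frac{|c_k|^2}{\sum_{j=1}^{d} |c_j|^2}\, \Omega(\lambda_k). \tag{20}$$

Proof. For $\{c_k\}$ as stated, there exist $c \in \mathbb{C}$,
$z_k \in \mathbb{Q}$, $\theta_k \in [0, 2\pi]$, $k = 1, \dots, d$, such that
$c_k = c\, e^{i\theta_k} \sqrt{z_k}$. Let $m_k, n$ be integers such that
$z_k = m_k/n$, $k = 1, \dots, d$; let $\{\chi_j\}$, $j = 1, \dots, s$, be an
orthonormal basis on an $s$-dimensional subspace of Hilbert space $H_s$, where
$s = \sum_{k=1}^{d} m_k$ (we may suppose for $j = 1, \dots, d$,
$\chi_j = \varphi_j$). Writing $s_k = m_1 + \dots + m_k$ (with $s_0 = 0$),
define $U$ on $H_s$ by the action
$U\varphi_k = \tfrac{1}{\sqrt{m_k}} \sum_{j=s_{k-1}+1}^{s_k} \chi_j$; let
$P_k = P_{U\varphi_k}$, $k = 1, \dots, d$. Let
$\psi = \sum_{k=1}^{d} c_k \varphi_k$; let $M$ realize
$g_1 = \langle \psi, \sum_{k=1}^{d} \lambda_k P_{\varphi_k}, \Omega \rangle$;
then for some region $r_1$, the initial state of $M$ is $\psi$ and the state of
each $M_k$ is $\varphi_k$ with outcome $\Omega(\lambda_k)$. Let the state of $M$
at $r_2$ be $U\psi$; then $M$ also realizes
$g_2 = \langle U\psi, \sum_{k=1}^{d} \lambda_k P_k, \Omega \rangle$, and by
consistency $V(g_1) = V(g_2)$. But by construction

$$U\psi = \sum_{k=1}^{d} c_k U\varphi_k = \sum_{k=1}^{d} \frac{c_k}{\sqrt{m_k}} \sum_{j=s_{k-1}+1}^{s_k} \chi_j = \sum_{k=1}^{d} \frac{c\, e^{i\theta_k}}{\sqrt{n}} \sum_{j=s_{k-1}+1}^{s_k} \chi_j$$

so

$$V\Big[U\psi, \sum_{k=1}^{d} \lambda_k P_k, \Omega\Big] = V\Big[\frac{c}{\sqrt{n}} \sum_{j=1}^{s} \chi_j, \sum_{k=1}^{d} \lambda_k \sum_{j=s_{k-1}+1}^{s_k} P_{\chi_j}, \Omega\Big]$$

(by Lemma 3(ii), with Lemma 3(iii) absorbing the phases $e^{i\theta_k}$). The
result follows from Theorem 4 (of $s$ equiprobable outcomes, $m_k$ have outcome
$\Omega(\lambda_k)$, so $V(g) = \tfrac{1}{s} \sum_{k=1}^{d} m_k \Omega(\lambda_k)$.
But $m_k/s = |c_k|^2 / \sum_{j=1}^{d} |c_j|^2$). ■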

Examination of the proof shows that the dependence of probabilities on
the modulus square of the expansion coefficients of the state ultimately
derives from the fact that we are concerned with unitary evolutions on Hilbert
space, specifically an inner-product space, and not some general normed linear
topological space. A general class of norms on the latter is of the form
$\|\xi\| = \big(\sum_{k=1}^{d} |\xi_k|^p\big)^{1/p}$, $1 \le p < \infty$ ($d$
may also be taken as infinite). Such spaces ($l^p$ spaces) are metric spaces and
can be completed in norm. The proof as we have developed it would apply equally
to a theory of unitary (i.e. invertible norm-preserving) motions on such a
space, yielding the probability rule

$$V\Big[\sum_{k=1}^{d} c_k \varphi_k, \sum_{k=1}^{d} \lambda_k P_{\varphi_k}, \Omega\Big] = \sum_{k=1}^{d} \frac{|c_k|^p}{\sum_{j=1}^{d} |c_j|^p}\, \Omega(\lambda_k) \tag{22}$$

(assuming that $\frac{|c_j|^p}{|c_k|^p} \in \mathbb{Q}$, $j, k = 1, \dots, d$).
But the only space of this form which is an inner-product space is $p = 2$
(Hilbert space).
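The contrast between $p = 2$ and other values of $p$ is easy to tabulate. The sketch below assumes that the $l^p$ probability rule weights the $k$th outcome by $|c_k|^p / \sum_j |c_j|^p$; the function name and the sample amplitudes are hypothetical, and real rational amplitudes with integer $p$ are used only so that the arithmetic stays exact.

```python
from fractions import Fraction


def lp_weights(amplitudes, p):
    """Weights |c_k|^p / sum_j |c_j|^p for real rational amplitudes and integer p >= 1."""
    powers = [abs(Fraction(c)) ** p for c in amplitudes]
    total = sum(powers)
    return [x / total for x in powers]


# Hypothetical unnormalized amplitudes.
c = [3, 4]
born_weights = lp_weights(c, 2)   # the Hilbert-space case, p = 2
l1_weights = lp_weights(c, 1)     # what the analogous rule would give for p = 1
```

For these amplitudes the $p = 2$ weights are $9/25$ and $16/25$, while $p = 1$ would give $3/7$ and $4/7$; in both cases the weights sum to one regardless of the normalization of the state.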



10 Case 5: Arbitrary States

There are a variety of possible strategies for the treatment of irrational
norms, but the one that is most natural, given that we are making use of
operational criteria for the interpretation of experiments, is to weaken these
criteria in the light of the limitations of realistic experiments. In practice,
one would not expect precisely the same state to be prepared on each run of the
experiment. Properly speaking, the statistics actually obtained will be those
for an ensemble of experiments; correspondingly, they should be obtained from a
family of models, differing slightly in their initial states. We should
therefore speak of approximate models (or of models that are approximately
realized) - where the differences among the models are small.

How small is small? What is the topology on the space of states? The
obvious answer, from a theoretical point of view, is the norm topology. We
should suppose that for sufficiently small $\epsilon$, so long as
$|\psi - \psi'| < \epsilon$, then if $\langle \psi, X, \Omega \rangle$ is an
approximate model for $M$ then so is $\langle \psi', X, \Omega \rangle$. Indeed,
$X$ and $\Omega$ will likewise be subject to small variations. (Only the outcome
set $U$ can be regarded as precisely specified, insofar as outcomes are
identified with numerals.)

But now it is clear that the details are hardly important; any algorithm that
applies to families of models of this type, yielding expectation values, will
have to be continuous in the norm topology. Given that, the extension of Theorem
5 to the irrational case is trivial. We define:

Definition 6 Let $\{g^{(i)}\}$ be any sequence of models
$\langle \psi^{(i)}, X, \Omega \rangle$, $i = 1, 2, \dots$, such that
$\lim_{i \to \infty} |\psi^{(i)} - \psi| = 0$, and let
$g = \langle \psi, X, \Omega \rangle$. Then $V$ is continuous in norm if
$\lim_{i \to \infty} V(g^{(i)}) = V(g)$.

We may finally prove:

Theorem 7 Let $V$ be consistent and continuous in norm. Then for any model
$\langle \psi, X, \Omega \rangle$:

$$V[\psi, X, \Omega] = \frac{\langle \psi, \Omega(X)\psi \rangle}{\langle \psi, \psi \rangle}. \tag{23}$$
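Proof. It is enough to prove that any realizable model satisfies Eq.(22). If
realizable, there is some multiple-channel experiment $M$ with $d$ channels and
$D$ outcomes that realizes $\langle \psi, X, \Omega \rangle$. Let
$\{\varphi_k\}$, $k = 1, \dots, d$, be any orthogonal family of vectors such
that $X\varphi_k = \lambda_k \varphi_k$ (not all the $\lambda_k$'s need be
distinct). Without loss of generality, let
$\psi = \sum_{k=1}^{d} c_k \varphi_k$,
$X = \sum_{k=1}^{d} \lambda_k P_{\varphi_k}$. Let
$0 < \epsilon = \sum_{k=1}^{d} |c_k|^2$ and let
$\{c_k^{(i)}\} \subset \mathbb{C}^d$ be any sequence of $d$-tuples such that
$\epsilon \le \sum_{k=1}^{d} |c_k^{(i)}|^2$,
$\frac{|c_j^{(i)}|^2}{|c_k^{(i)}|^2} \in \mathbb{Q}$,
$\lim_{i \to \infty} c_k^{(i)} = c_k$ (such a sequence can always be found).
Let $\psi^{(i)} = \sum_{k=1}^{d} c_k^{(i)} \varphi_k$,
$g^{(i)} = \langle \psi^{(i)}, X, \Omega \rangle$. By Theorem 5,

$$V[\psi^{(i)}, X, \Omega] = \sum_{k=1}^{d} \frac{|c_k^{(i)}|^2}{\sum_{j=1}^{d} |c_j^{(i)}|^2}\, \Omega(\lambda_k) = \frac{1}{\langle \psi^{(i)}, \psi^{(i)} \rangle} \sum_{k=1}^{d} \Omega(\lambda_k) \langle \psi^{(i)}, P_{\varphi_k} \psi^{(i)} \rangle.$$

The numerator is
$\sum_{k=1}^{d} \langle \psi^{(i)}, \Omega(\lambda_k) P_{\varphi_k} \psi^{(i)} \rangle = \langle \psi^{(i)}, \Omega(X) \psi^{(i)} \rangle$;
since the denominator is bounded below by $\epsilon > 0$, with
$\lim_{i \to \infty} \sum_{k=1}^{d} |c_k^{(i)}|^2 = \sum_{k=1}^{d} |c_k|^2$ (by
the continuity of the inner product), and since
$\lim_{i \to \infty} \langle \psi^{(i)}, \Omega(X)\psi^{(i)} \rangle = \langle \psi, \Omega(X)\psi \rangle$
(again by the continuity of the inner product), the result follows from the
continuity of $V$. ■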

A similar proof can be given for a general probability rule on $l^p$ spaces,
$p \neq 2$ (i.e. Eq.(22), for arbitrary complex coefficients; of course this
result could not be expressed as in Eq.(23), using an inner product).

Is a continuity assumption permitted in the present context? Gleason's 
theorem does not require it; if one is going to do better than Gleason's theorem, 
it would be pleasant to derive the continuity of the probability measure, rather 
than to assume it. But from an operational point of view continuity is a very 
natural assumption: no algorithm that could ever be used is going to distinguish 
between states that differ infinitesimally. 
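How the continuity assumption carries the rational case over to irrational norms can be seen in a small numerical experiment. The sketch below is illustrative only: the function names, the choice $|c_1|^2 = \cos^2 1$ and the outcome values are hypothetical. It approximates an irrational squared amplitude $p$ by fractions $m/N$, applies the rational-norm result to each approximant, and records the distance from the Born value $p\,\Omega(\lambda_1) + (1 - p)\,\Omega(\lambda_2)$.

```python
import math


def theorem5_expectation(m, N, omega1, omega2):
    """Rational-norm expectation for squared amplitudes in the ratio m : (N - m)."""
    return (m * omega1 + (N - m) * omega2) / N


def approximation_errors(p, omega1, omega2, denominators):
    """Distance from the Born value when p is replaced by the nearest m/N."""
    target = p * omega1 + (1 - p) * omega2
    errors = []
    for N in denominators:
        m = min(max(round(N * p), 1), N - 1)  # keep both amplitudes nonzero
        errors.append(abs(theorem5_expectation(m, N, omega1, omega2) - target))
    return errors


# Hypothetical example: |c_1|^2 = cos^2(1) is irrational.
p = math.cos(1.0) ** 2
denominators = [10, 1000, 100000]
errors = approximation_errors(p, 2.0, -1.0, denominators)
```

Since $|m/N - p| \le 1/(2N)$, each error is at most $|\Omega(\lambda_1) - \Omega(\lambda_2)|/(2N)$, so the rational-norm values converge to the Born value as the approximating states converge in norm.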



11 A Role for Decision Theory 

Deutsch [2] took a rather different view: he was at pains to establish the Born
rule for irrational norms, without assuming continuity. His method, however,
was far from operational: along with axioms of decision theory, he assumed that
quantum mechanics is true (under the Everett interpretation).

A hybrid is possible: the present method can in fact be supplemented with 
axioms of decision theory, yielding the Born rule for irrational norms, without 
any continuity assumption. But as Wallace [12] makes clear, nothing much 
hangs on this question. One can do without a continuity assumption, but there
are just as good reasons to invoke it from a decision theoretic point of view
as from an operational one. In neither case is there any reason to distinguish
between states that differ infinitesimally. 

Decision theory is important for a rather different reason: it is because 
the non-probabilistic parts of decision theory (as Deutsch puts it), or decision 
theory in the face of uncertainty (as Wallace puts it) can provide an account of
probability in terms of something else. This matters in the case of the Everett
interpretation; according to many, the Everett interpretation has no place for
probability [7]; given Everett, probability cannot be taken as primitive. 

So it is clear why Deutsch took the more austere line: if Everett is to be 
believed, quantum mechanics is purely deterministic. Deutsch supposed that 
the fundamental concept (that can be taken as primitive) is rather the value or 
the utility that an agent places upon a model - that V(g) is in fact a utility. 
He argued that experiments should be thought of as games; for each registered 
outcome in $U$, we are to associate some utility, fixed in advance. So, in
effect, the mapping $\Omega : \lambda_k \to \Omega(\lambda_k) \in U$ defines the
payoff for the outcome $\lambda_k$.

Decision theory on this approach has a substantial role. If we suppose that 
the utilities of a rational agent are ordered, and satisfy very general assumptions 
("axioms of rationality"), a representation theorem can be derived [10] which 
defines subjective probability in terms of the ordering of an agent's utilities. In 
effect, one deduces - in accordance with these axioms - that the agent acts as
if she places such-and-such subjective probabilities on the outcomes of various 
actions. 8 
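
The "as if" reading can be illustrated in the simplest case of a game with two outcomes. The sketch below is not the representation theorem of [10]; it assumes that the agent's value for the game already has expected-utility form, V = p u_1 + (1 - p) u_2, and simply solves for p. The function name `implied_probability` and the numbers are hypothetical.

```python
def implied_probability(game_value, payoff_1, payoff_2):
    """Probability p of outcome 1 for which p*payoff_1 + (1-p)*payoff_2
    equals the value the agent places on the game (an illustrative assumption)."""
    if payoff_1 == payoff_2:
        raise ValueError("payoffs must differ for p to be determined")
    return (game_value - payoff_2) / (payoff_1 - payoff_2)

# An agent who values a game paying 10 or 0 at 2.5 acts as if p = 0.25.
p = implied_probability(game_value=2.5, payoff_1=10.0, payoff_2=0.0)
```

The direction of inference is the point: the value of the game is the input, and the probability is read off from it.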

It is important that one can still make sense of uncertainty in this context, 
as Wallace explains. It may be that we cannot help ourselves to probabilistic ideas ab
initio, but that does not mean that one deals only with certainties - that games,
in some sense, have only a single payoff, as Deutsch at one point suggests [2,
p. 3132-3]. From a first-person perspective, one does not know what outcome of a
quantum game to expect to observe (there is certainly no first-person perspective 
from which they can all be observed). In fact, it is enough that - in the face of 
branching - a rational agent expects anything at all (that she does not expect 
oblivion [9]). 

On this line of thought, the proofs of the Born rule just presented make 
an illegitimate assumption: Eq.(1). We are not entitled to assume that the
macroscopic outcomes $U_j \in U$, $j = 1, \dots, D$, occur with probabilities $p_j$, for
they all occur; so neither can we assume there are non-negative real numbers,
summing to one, satisfying Eq.(2). But the proof of Theorem 4 (hence 5 and 7)
depended on this assumption. Of course we may, with Deutsch and Wallace,
eventually be in a position to make statements about the subjective probabilities
of branches, but if so such statements will have to come at a later stage - after
establishing the values V(g) of various games. But then how are we to establish
these values?

Here Wallace has provided a considerably more detailed analysis than Deutsch,
and from weaker premises. But the proofs are correspondingly more complicated;
for the sake of simplicity we shall only consider Deutsch's argument,
removing the ambiguities of notation in the way shown by Wallace.

8 This does not mean that subjective probabilities are illusory, and correspond to nothing in
reality. The point is to legitimate the concept, not to abolish it. As for its objective correlate,
the most popular candidate has long been relative frequency (of outcomes in a sequence of
trials). Relative frequencies are obviously important when it comes to evidence for
probabilities, but there are well-known difficulties with trying to identify them with probabilities
(for anything short of infinite sequences). We read Everett as making a contrary proposal:
that the objective correlates of subjective probability are branches in the universal state (with
respect to the decoherence basis). Here we are deducing the quantitative rule to be used in
assigning subjective probabilities to branches.

First, consider Case 1, the Stern-Gerlach experiment. All is in order up to
Eq.(11), but we must do without the assumption subsequently made - that the
registered outcome $\Omega(+)$ results with probability $w$, and outcome $\Omega(-)$ with
probability $1 - w$. Here Deutsch invokes a new principle, what he calls the
zero-sum rule:

$$V[\psi, X, \Omega] = -V[\psi, X, -\Omega]. \qquad (24)$$

Following Deutsch, let us assume that the numerical value of the utility $\Omega(\lambda_k)$
equals $\lambda_k$. Then, in the special case where $\lambda_1 = -\lambda_2$ (true for the measurement
of a component of spin), from Eq.(24), applied to Eq.(11), we deduce:

$$V[c_1\varphi_1 + c_2\varphi_2, \sigma_z, \Omega] = -V[c_1\varphi_1 + c_2\varphi_2, \sigma_z, \Omega] \qquad (25)$$

and hence that $V[c_1\varphi_1 + c_2\varphi_2, \sigma_z, \Omega] = 0$, in accordance with the Born rule in
this special case.
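
As a numerical sanity check (it plays no part in the argument, which may not presuppose probabilities), the following sketch takes the Born rule as a candidate algorithm and confirms that it satisfies the zero-sum rule and returns zero for equal norms when the two eigenvalues are opposite. The function `born_value` and the encoding of a model as a list of amplitudes, a list of eigenvalues, and a payoff function are illustrative assumptions, not notation from the text.

```python
import math

def born_value(amplitudes, eigenvalues, payoff):
    """Born-rule expectation of payoff(lambda_k), weighted by |c_k|^2 / sum_j |c_j|^2."""
    weights = [abs(c) ** 2 for c in amplitudes]
    total = sum(weights)
    return sum(w * payoff(lam) for w, lam in zip(weights, eigenvalues)) / total

def omega(lam):
    # Payoff equal to the eigenvalue, as assumed following Deutsch.
    return lam

def minus_omega(lam):
    # The payoff function -Omega appearing in the zero-sum rule.
    return -omega(lam)

spin = [+1.0, -1.0]  # lambda_1 = -lambda_2, a component of spin

equal_norms = [1 / math.sqrt(2), 1j / math.sqrt(2)]
unequal_norms = [0.6, 0.8]

# Equal norms: the value of the game vanishes.
v_equal = born_value(equal_norms, spin, omega)

# Zero-sum rule: V[psi, X, Omega] + V[psi, X, -Omega] = 0, also for unequal norms.
zero_sum_gap = (born_value(unequal_norms, spin, omega)
                + born_value(unequal_norms, spin, minus_omega))
```

Both `v_equal` and `zero_sum_gap` vanish up to floating-point rounding. The check runs in the opposite direction to the text: here the Born rule is assumed and the zero-sum rule verified, whereas the text assumes the zero-sum rule and deduces the value of the game.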

Although evidently of limited generality, the result is illustrative - assuming
the zero-sum rule can be independently justified. (Of course it follows trivially
from Eq.(2), but this was derived from Eq.(1), and at this point we cannot
make use of the concept of probability.) Here is an argument: banking too
is a form of gambling; the only difference between acting as the gambler who
bets, and as the banker who accepts the bet, is that whereas the gambler pays
a stake in order to play, and receives payoffs according to the outcomes, the
banker receives the stake in order to play, and pays the payoffs according to
the outcomes. The zero-sum rule is the statement that the most that one will
pay in the hope of gaining a utility is the least that one will accept to take the
risk of losing it. We may take it that this principle, as a principle of zero-sum
games, is perfectly secure. And evidently any quantum experiment can be used
to play a zero-sum game; therefore this principle also applies to the expected
utility of experiments.

What of the general equal-norm case, Case 2? Here the zero-sum rule is not
enough. But if we consider only the case d = 2, it is enough to supplement
it with another rule, what Deutsch calls the additivity rule. A payoff function
$\Omega : \mathbb{R} \to U$ is additive if and only if $\Omega(x + y) = \Omega(x) + \Omega(y)$. Let
$f_k : \mathbb{R} \to \mathbb{R}$ be the function $f_k(x) = x + k$; then $V$ is additive if and only if

$$V[\psi, X, \Omega \circ f_k] = V[\psi, X, \Omega] + \Omega(k). \qquad (26)$$

Additivity of the payoff function is a standard assumption of elementary decision
theory, eminently valid for small bets (but hardly valid for large ones, or for
utilities that only work in tandem). Additivity of V then has a clear rationale:
it is an example of a sure-thing principle. Given two games that are exactly
the same, except that in one of them one receives an additional utility $\Omega(k)$
whatever the outcome, one should value that game as having an additional
utility $\Omega(k)$.



To see how additivity can be used in Case 2 (but restricted to d = 2), observe
that for $k = -\lambda_1 - \lambda_2$, the function $(-I) \circ f_k$ is the permutation $\pi$. Therefore
from Eq.(14) we may conclude:

$$V[\psi, X, \Omega] = V[\psi, X, \Omega \circ (-I) \circ f_k]. \qquad (27)$$

By additivity the RHS is $V[\psi, X, \Omega \circ (-I)] + (\Omega \circ (-I))(k)$, and since $\Omega$ is additive
(so $\Omega \circ (-I) = -\Omega$) we obtain, from the zero-sum rule,

$$V[\psi, X, \Omega] = -V[\psi, X, \Omega] - \Omega(k). \qquad (28)$$

With a further application of payoff additivity there follows

$$V[\psi, X, \Omega] = \frac{1}{2}[\Omega(\lambda_1) + \Omega(\lambda_2)] \qquad (29)$$

in accordance with the Born rule.
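
The algebra of this step can be checked numerically. The sketch below confirms that, for k equal to minus the sum of the two eigenvalues, negation composed with the shift by k swaps the eigenvalues, and that the Born rule (taken here as a candidate algorithm, purely for the check) satisfies the additivity rule and gives the half-sum of the payoffs for equal norms. The helper `born_value`, the linear payoff `omega`, and the eigenvalues 2 and 5 are arbitrary illustrative choices.

```python
def born_value(amplitudes, eigenvalues, payoff):
    """Born-rule expectation of payoff(lambda_k), weighted by |c_k|^2 / sum_j |c_j|^2."""
    weights = [abs(c) ** 2 for c in amplitudes]
    total = sum(weights)
    return sum(w * payoff(lam) for w, lam in zip(weights, eigenvalues)) / total

def omega(x):
    # An additive payoff: omega(x + y) == omega(x) + omega(y). The slope is arbitrary.
    return 3.0 * x

lam1, lam2 = 2.0, 5.0
k = -lam1 - lam2

def f_k(x):
    # The shift f_k(x) = x + k.
    return x + k

def swap(x):
    # The composite (-I) o f_k, which should exchange lam1 and lam2.
    return -f_k(x)

swapped = (swap(lam1), swap(lam2))

psi = [1.0, 1.0]  # equal norms; born_value normalizes the weights
eigs = [lam1, lam2]

# Additivity rule: V[psi, X, Omega o f_k] = V[psi, X, Omega] + Omega(k).
lhs_additivity = born_value(psi, eigs, lambda x: omega(f_k(x)))
rhs_additivity = born_value(psi, eigs, omega) + omega(k)

# Equal-norm result: V[psi, X, Omega] = (Omega(lam1) + Omega(lam2)) / 2.
value = born_value(psi, eigs, omega)
half_sum = 0.5 * (omega(lam1) + omega(lam2))
```

Here `swapped` comes out as `(lam2, lam1)`, and the two sides of each identity agree.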

As Wallace has shown, this, along with the higher dimensional cases (d > 2), 
can be derived from much weaker axioms of decision theory, that do not assume 
additivity. Theorem 5 then goes through unchanged. 9 As already remarked, 
one is then in a position to derive the extension to the irrational case without 
assuming continuity: for the details, I refer to Wallace [12]. 

Decision theory can evidently play a role in the derivation of the Born rule, 
but it is only needed if the notion of probability is itself in need of justification.
That may well be so, in the context of the Everett interpretation; but on other 
approaches to quantum mechanics, probability, whatever it is, can be taken as 
given. 

12 Gleason's Theorem 

Compare Gleason's theorem: 

Theorem 8 Let f be any function from 1-dimensional projections on a Hilbert
space of dimension $d > 2$ to the unit interval, such that for each resolution of
the identity $\{P_k\}$, $k = 1, \dots, d$, $\sum_{k=1}^{d} P_k = I$, we have $\sum_{k=1}^{d} f(P_k) = 1$. Then there exists
a unique density matrix $\rho$ such that $f(P_k) = \mathrm{Tr}(\rho P_k)$.

Proof. Gleason (1957) ■
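
The content of the theorem is the hard direction, from the premise to the trace formula. The converse is elementary and easy to check numerically: for any density matrix, the trace formula assigns each 1-dimensional projector a number in the unit interval, and these numbers sum to one over every resolution of the identity. The sketch below illustrates only this converse, in d = 3, and is restricted to real vectors for brevity; the helper names are hypothetical.

```python
import math
import random

def gram_schmidt(vectors):
    """Orthonormalize a list of real vectors (assumed linearly independent)."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            overlap = sum(x * y for x, y in zip(w, b))
            w = [x - overlap * y for x, y in zip(w, b)]
        norm = math.sqrt(sum(x * x for x in w))
        basis.append([x / norm for x in w])
    return basis

def f(rho, e):
    """Tr(rho P) for the projector P onto the unit vector e, i.e. <e|rho|e>."""
    d = len(e)
    return sum(e[i] * rho[i][j] * e[j] for i in range(d) for j in range(d))

rng = random.Random(0)
d = 3

# A real density matrix: a mixture of two pure states with weights 0.7 and 0.3.
u, v = gram_schmidt([[rng.gauss(0, 1) for _ in range(d)] for _ in range(2)])
rho = [[0.7 * u[i] * u[j] + 0.3 * v[i] * v[j] for j in range(d)] for i in range(d)]

# Sum f over several randomly chosen resolutions of the identity.
totals = []
values = []
for _ in range(5):
    basis = gram_schmidt([[rng.gauss(0, 1) for _ in range(d)] for _ in range(d)])
    values.extend(f(rho, e) for e in basis)
    totals.append(sum(f(rho, e) for e in basis))
```

Each entry of `totals` equals one up to rounding, whichever orthonormal basis is drawn. Note that the value assigned to a projector here does not depend on the basis in which it appears, which is the non-contextuality built into the premise.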

A first point is that the derivation of the Born rule presented here concerns
the notion of a fixed algorithm that applies to arbitrary measurement models,
hence to Hilbert spaces of arbitrary dimension, whereas Gleason's theorem
concerns an algorithm that applies to arbitrary resolutions of the identity on a
Hilbert space of fixed dimension. Although the proof of Theorems 5 and 7 made
use of a Hilbert space of large dimensionality, it applies to the 2-dimensional
case as well.

9 It is worth remarking that a derivation of the Born rule for initial states that are not
normalized to unity is just what is needed for the Everett interpretation, as also the de
Broglie-Bohm theory (in reality, according to either approach, one always deals with branch
amplitudes with modulus strictly less than one - supposing the initial state of the universe
has modulus one).

More important, on a variety of approaches to quantum mechanics, nothing
so strong as Gleason's premise is really motivated. It is not required that
probabilities can be defined for a projector independent of the family of projectors
of which it is a member. This requirement, sometimes called non-contextuality
[8], is very strong. Very few approaches to quantum mechanics subscribe to it.
The theorem has no relevance to any approach that singles out a unique basis
once and for all: it applies neither to the GRW theory [5], nor to the de
Broglie-Bohm theory [6], which single out the position basis; it does not apply to the
Everett interpretation [?], which singles out a basis approximately localized in
phase space; it does not apply to the consistent histories approach [3],
assuming the choice of decoherent history space is unique. All these theories require
only that probabilities be defined for projectors associated with the preferred
basis - if they apply to any other resolution of the identity, it is insofar as in
a particular context, experimental or otherwise, the latter projections become
correlated with members of the former family.

But so much is entirely compatible with the derivation that we have offered.
By all means restrict Definition 1 to observables compatible with a unique
resolution of the identity (and likewise the consistency condition of Definition 2).
Lemma 3 proves identities for expectation values for commuting observables; it
likewise can be restricted to a unique resolution of the identity; likewise
Theorem 4. In Theorem 5 an auxiliary basis was used, but this can again be
taken as the preferred basis. And whilst it is in the spirit of Theorem 7 that
probabilities should also be defined for small variations in projectors, this does
not yet amount to the assumption of non-contextuality.

Unlike the premise of Gleason's theorem, the operational criteria that we 
have used are hardly disputed; they are common ground to all the major schools 
of foundations of quantum mechanics. But it would be wrong to suggest that 
they apply to all of them equally: on some approaches - in particular, those that 
provide a detailed dynamical model of measurements - there is good reason to 
suppose that an algorithm for expectation values will depend on additional
factors (in particular, on the state at the instant of state reduction); the Born
rule may no longer be forced in consequence. (But we take it that this would
be an unwelcome consequence of these approaches; the Born rule will have to
be otherwise justified - presumably, as it is in the GRW theory, as a hypothesis.)

Of the major schools, two - the Everett interpretation, and those based
on operational assumptions (here we include the Copenhagen interpretation)
- offer no such resources. This point is clear enough in the latter case; in
the case of the Everett interpretation, the association of models with multiple
channel experiments as given in Definition 1 follows from the full theory of
measurement. 10 Quantum mechanics under the Everett interpretation provides
no leeway in this matter. The same is likely to be true of any approach to
quantum mechanics that preserves the unitary formalism intact, without any
supplement to it.

10 For arguments in the still more general case, on applying the Everett theory of
measurement to any experiment, I refer to Wallace [12] (see in particular his principle of "measurement
neutrality").

The principal remaining schools have a rather different status. One, the 
state-reduction approach, has already been remarked on: a new and detailed 
dynamical theory of measurement is likely to offer novel definitions of
experimental models and novel criteria for when they are to be applied. The other is
the hidden-variable approach, in which the state evolves unitarily even during 
measurements (but is incomplete). This case deserves special consideration. 

13 Completeness 

As it happens, the one approach to foundations in which the Born rule has been 
seriously questioned is an example of this type (the de Broglie-Bohm theory) [11]. 
Hidden variables certainly make a difference to the argument we have presented. 
Consider the proof of Theorem 4. The passage from Eq.(13) to Eq.(14) hinged
on the fact that the state on both sides of Eq.(13) is identical when the norms of
its components are the same. (Likewise the step from Eq.(10) to (11).) But if
the state is incomplete, this is not enough to ensure the required identification.
Including the state of the hidden variables as well (denoted $\omega$), we should replace
$\psi$ by the pair $\langle \psi, \omega \rangle$ ($\omega$ may be the value of the hidden variable, or a
probability distribution over its values). Doing this, as Wallace has pointed
out [12], there is no guarantee that in the case of superpositions of equal norms
- e.g. for $\psi = \frac{1}{\sqrt{2}}(\varphi_1 + \varphi_2)$, where $\varphi_1$, $\varphi_2$ are, as in Case 1, eigenstates of the
z-component of spin - that $U_\pi$ (permuting $\varphi_1$ and $\varphi_2$) will act as the identity.
Although $U_\pi \psi = \psi$, its action on $\langle \psi, \omega \rangle$ may well be different from the
identity; how is the permutation to act on the hidden variables?

The question is clearer when $U_\pi$ implements a spatial transformation. We
have an example where it does: the Stern-Gerlach experiment. In this case
$U_\pi \sigma_z U_\pi^{-1} = -\sigma_z$, a reflection in the x-y plane. Under the latter, a particle
initially with positive z-coordinate ($\omega = +$) is mapped to one with negative
z-coordinate ($\omega = -$). Under this same transformation, the superposition
$\psi = \frac{1}{\sqrt{2}}(\varphi_1 + \varphi_2)$ is unchanged. Therefore $U_\pi : \langle \psi, + \rangle \mapsto \langle \psi, - \rangle \neq \langle \psi, + \rangle$;
there is no longer any reason to suppose that Eq.(11) will be satisfied.
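
The point can be made concrete with a toy encoding. In the sketch below the state is a pair of spin amplitudes and the hidden variable is the sign of the particle's z-coordinate; the permutation swaps the amplitudes and, being a reflection in the x-y plane, flips the sign. The tuple representation and the function `permute` are assumptions made for illustration only.

```python
import math

def permute(pair):
    """Illustrative action of the permutation on (state, hidden variable):
    swap the two spin components and reflect the particle's z-side."""
    (c1, c2), side = pair
    return ((c2, c1), -side)

amp = 1 / math.sqrt(2)
psi = (amp, amp)    # equal-norm superposition of the two spin eigenstates
before = (psi, +1)  # particle initially on the positive-z side
after = permute(before)

state_unchanged = after[0] == before[0]  # the quantum state is invariant
pair_unchanged = after == before         # the completed description is not
```

Here `state_unchanged` is true while `pair_unchanged` is false: the symmetry of the state alone no longer licenses identifying the two models.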

This situation is entirely as expected. In the de Broglie-Bohm theory, given
such an initial state $\psi$, it is well known that if the incident particle is located
on one side of the plane of symmetry of the Stern-Gerlach apparatus, then it
will always remain there. It is obvious that if the particle is always located on
the same side of this plane, on repetition of the experiment, the statistics of the
outcomes will disagree with the Born rule. It is equally clear that if particles are
randomly distributed about this plane of symmetry then the Born rule will be
obeyed - but that is only to say that the probability distribution for the hidden
variables is determined by the state, in accordance with the Born rule. This is
what we are trying to prove.
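
A toy simulation shows the dependence of the statistics on the particle distribution. It does not integrate de Broglie-Bohm trajectories; it assumes only what the text states, that a particle starting on one side of the symmetry plane stays there, so that the registered outcome is fixed by the initial side. The function names, the seed, and the number of trials are arbitrary.

```python
import random

def mean_outcome(sample_side, trials, rng):
    """Average spin outcome when the outcome is fixed by the side of the
    symmetry plane on which the particle starts (+1 above, -1 below)."""
    return sum(sample_side(rng) for _ in range(trials)) / trials

def always_above(rng):
    # Every particle starts on the positive-z side.
    return +1

def symmetric(rng):
    # Particles are distributed evenly about the plane of symmetry.
    return rng.choice([+1, -1])

rng = random.Random(1)
biased_mean = mean_outcome(always_above, 10_000, rng)
symmetric_mean = mean_outcome(symmetric, 10_000, rng)
born_mean = 0.0  # Born-rule expectation of the spin component for equal norms
```

The one-sided ensemble gives a mean of exactly 1, in disagreement with the Born value of 0, while the symmetric ensemble gives a mean close to 0.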

But it does not follow that the arguments we have given have no bearing on
such a theory. Our strategy, recall, was to derive constraints on an algorithm -
any algorithm - that takes as inputs experimental models and yields as outputs
expectation values. The constraints will apply even if the state is incomplete,
even if there are additional parameters controlling individual measurement
outcomes - so long as the state alone determines the statistical distribution of the
hidden variables. Given that, then any symmetries of the state will also be
symmetries of the distribution of hidden variables. In application to the de
Broglie-Bohm theory, our result indeed implies that the particle distribution
must be given by the Born rule - this is no longer an additional postulate of
the theory - so long as the particle distribution is determined only by the state.
The assumption is not that particles must be distributed in accordance with the
Born rule, but that they are distributed by any rule at all that is determined
by the state. Then it is the Born rule.